ABSTRACT

In this paper, we present a solution for physically challenged people, specifically people with hearing
impairment. It is a smart application designed for hearing-impaired people. People with hearing loss have to
carry out the activities of daily living at home, at work and in business situations. They may have difficulty
hearing environmental sounds and identifying them. Our main contribution is to help hearing-impaired people
understand the types of sound that are useful to them. Conventional sound recognition techniques are not
directly applicable because background noise and reverberation are high, which leads to low performance.
A deep neural network is capable of classifying and predicting information from unstructured data such as
images, text or sounds, which enables the machine to recognize environmental sounds. In this paper, a deep
learning algorithm called CNN (convolution neural network) is used to classify sound clips. The model
achieves an accuracy of 80%, which is higher than the conventional technique, and its results are
comparable to other approaches.

Keywords- deep learning; convolution neural network; feature extraction; sound recognition; sound event 
classification. 


I.INTRODUCTION


Sound plays a vital role in everyone's life
by carrying information between people.
Hearing-impaired people find it difficult to
understand sound when background noise
is present. For example, Whoopi Goldberg, a
comic writer with hearing loss, discovered a
portable listening platform for children to
help prevent hearing loss. Sound
recognition is a solution that can guide
hearing-impaired people in overcoming the
difficulty of hearing sounds.


This smart app is intended for physically
challenged people, such as people who are
deaf or speech-impaired, and uses the
Internet of Things and a deep learning
algorithm. The main aim is to raise an alarm
in emergency situations for physically
challenged people. The alarm is simply a
silent notification for the user. For example,
a deaf person who is working at a desktop or
personal computer may not be aware of the
surroundings and environment.


This smart app gives an alarm/notification to
the user through the desktop/personal
computer. If a fire breaks out and the fire
alarm goes off, the application detects the
fire alarm sound and notifies the user that a
fire alarm has been detected. With this
notification, physically challenged people
can protect themselves in an emergency
situation. The UrbanSound8K dataset is
used.


The CNN algorithm detects the sound and
then analyzes its type. After analyzing the
sound, it recognizes the sound and sends a
notification to the user. Within a given area,
the system is able to detect the type of
sound. The sound is captured by a hardware
implementation based on the Internet of
Things. The model is trained on the dataset
using deep learning, so that it can analyze
the type of the captured sound and report it
to the user via a notification. The aim is to
produce accurate results.


II.METHODOLOGY

The detection and classification of sound is
a multilevel process: Sound dataset, Audio
preprocessing, Feature extraction, CNN
classification model, Hardware
implementation, Sound classification,
Notification.


Sound Datasets: In this paper the
UrbanSound8K dataset is used. It contains
8732 recording samples in 10 classes of
different sound sources such as dog bark,
car horn and siren. Each sample is at most
4 seconds long, and some samples are
considerably shorter. UrbanSound8K is
subdivided into 10 subgroups for 10-fold
cross validation. The 8732 sound excerpts
are cropped from a smaller number of longer
recordings using librosa, which can lead to
over-optimistic results if related excerpts
fall into both the training and the test data.
Using the 10 predefined folds avoids this
issue and makes the results more reliable.
The dataset contains .wav format files. The
.wav files are converted into a matrix of
numbers (a 10X10 matrix for 10 sound files)
using the Python soundfile library.
Figure.1

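
The fold-based evaluation described for the
dataset can be sketched as follows. This is a
minimal illustration, not the authors' code:
the metadata rows and the helper name
fold_splits are hypothetical, and only the
idea of holding out one predefined fold at a
time is taken from the text.

```python
from collections import defaultdict

# Hypothetical metadata rows: (file name, fold number, class label).
# The values are made up for illustration only.
ROWS = [
    ("a.wav", 1, "dog_bark"),
    ("b.wav", 1, "siren"),
    ("c.wav", 2, "car_horn"),
    ("d.wav", 2, "dog_bark"),
    ("e.wav", 3, "siren"),
]


def fold_splits(rows):
    """Yield (held_out_fold, train_rows, test_rows), one split per fold."""
    by_fold = defaultdict(list)
    for row in rows:
        by_fold[row[1]].append(row)
    for held_out in sorted(by_fold):
        test = by_fold[held_out]
        # Every clip from the other folds goes to training.
        train = [r for f, rs in by_fold.items() if f != held_out for r in rs]
        yield held_out, train, test


for fold, train, test in fold_splits(ROWS):
    print(f"fold {fold}: {len(train)} train clips, {len(test)} test clips")
```

Because a clip never appears in both lists of
a split, excerpts assigned to the same fold
cannot leak between training and testing.
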
Audio preprocessing: The audio clips in the
.wav files need to be preprocessed before
they can be used. For audio processing, the
librosa library provides useful functionality.
Using librosa, audio files are loaded into a
numpy array that holds the amplitudes of
the corresponding audio clip. The number of
amplitude samples per second is called the
sampling rate, which is usually 22050 Hz or
44100 Hz. After loading the audio clip into
an array, noise should be removed. To
suppress the noise in the audio, a spectral
gating algorithm is used, which is
implemented in the noisereduce Python
library. In the resulting audio clip the
leading and trailing parts are trimmed to
obtain the noise-reduced audio file.
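
The trimming step can be illustrated with a
small sketch. It is a simplified stand-in,
assuming a plain list of amplitudes and a
peak-relative threshold; the function name
trim_quiet_edges and the threshold_ratio
parameter are hypothetical, and the sketch
does not perform spectral gating or
reproduce the behaviour of any library.

```python
def trim_quiet_edges(samples, threshold_ratio=0.05):
    """Drop leading and trailing samples quieter than a fraction of the peak."""
    if not samples:
        return []
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return []  # the clip is pure silence
    limit = peak * threshold_ratio
    # Indices of all samples that are loud enough to keep.
    loud = [i for i, s in enumerate(samples) if abs(s) >= limit]
    # Keep everything between the first and the last loud sample.
    return samples[loud[0]:loud[-1] + 1]


clip = [0.0, 0.001, -0.002, 0.4, -0.9, 0.3, 0.0, 0.6, 0.001, 0.0]
trimmed = trim_quiet_edges(clip)
print(len(clip), "->", len(trimmed))
```

Quiet samples inside the clip are kept; only
the edges are removed, so the timing of the
sound event itself is unchanged.
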




Figure.2 


Feature Extraction: In general, without
feature extraction, the audio data cannot be
interpreted by the classification model.
To make it usable, audio features are
extracted from the preprocessed audio
clips. The absolute values of the Short Time
Fourier Transform (STFT) are extracted
from each audio clip.

1. To calculate the STFT, first choose the
FFT window size.

2. According to the equation
n_stft = n_fft/2 + 1, where n_fft is the
Fourier transform window size, n_stft is the
number of STFT frequency bins f.

3. Consider t as the number of audio
samples, f as the number of frequency bins
in the STFT, h as the window hop length and
w as the window length.

4. The number of windows is calculated
using 1 + (t-w)/h, rounded down.

5. For each window, the amplitudes of the
frequency bins, which range from 0 to
sampling_rate/2, are stored, giving a 2D
array.

6. This 2D array is normalized to get a
common loudness level.
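
The steps above can be sketched in a few
lines. This is an illustrative sketch under
stated assumptions, not the paper's
implementation: it uses a naive DFT with a
rectangular window and small sizes so that it
runs with the standard library only, it takes
the window length w equal to n_fft, and the
names stft_magnitude and normalize are
hypothetical. A real pipeline would normally
rely on a library STFT.

```python
import cmath
import math


def stft_magnitude(samples, n_fft=8, hop=4):
    """Return a list of frames, each with n_fft // 2 + 1 magnitude values."""
    n_bins = n_fft // 2 + 1                       # n_stft = n_fft/2 + 1
    n_frames = 1 + (len(samples) - n_fft) // hop  # 1 + (t - w)/h, rounded down
    frames = []
    for i in range(n_frames):
        window = samples[i * hop:i * hop + n_fft]
        row = []
        for k in range(n_bins):
            # Naive DFT of one window at frequency bin k.
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                      for n, x in enumerate(window))
            row.append(abs(acc))
        frames.append(row)
    return frames


def normalize(frames):
    """Scale all magnitudes so that the largest value becomes 1.0."""
    peak = max(max(row) for row in frames)
    if peak == 0:
        return frames
    return [[v / peak for v in row] for row in frames]


# A sine wave with 2 cycles per 8-sample window should peak in bin 2.
signal = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
spec = normalize(stft_magnitude(signal))
print(len(spec), "frames x", len(spec[0]), "bins")
```

With t = 32, w = 8 and h = 4 the formula in
step 4 gives 1 + (32-8)/4 = 7 windows, and
step 2 gives 8/2 + 1 = 5 frequency bins.
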


CNN Classification Model: The model is
trained on the extracted features to classify
audio events, and a CNN classification
model is used for this purpose. For
classification, the data is converted into a
spectrogram using librosa. After the
spectrogram is built and plotted, a two-layer
neural network consisting of an input layer
and an output layer is implemented. The
weights are defined by the hidden layer,
which maps between the input and the
output layer. The last layer of the network is
a softmax layer in the output layer, from
which the probability distribution over the
classes for the audio clip is obtained. This
paper shows that a CNN can classify and
predict sound clips with greater accuracy.

CNN Confusion Map
Figure.3
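
The role of the softmax layer can be shown
with a short sketch. The raw scores and the
three class labels below are made-up example
values, not outputs of the trained model; the
sketch only illustrates how scores become a
probability distribution.

```python
import math


def softmax(scores):
    """Convert raw output-layer scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three classes; the values are made up.
labels = ["dog_bark", "car_horn", "siren"]
probs = softmax([1.2, 0.3, 2.5])
best = max(range(len(probs)), key=probs.__getitem__)
print(labels[best], round(probs[best], 3))
```

The class with the largest probability is
taken as the prediction for the clip.
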


Hardware Implementation: The hardware
implementation is used for capturing the
external sound. The hardware consists of a
Raspberry Pi 3 and a microphone.


Raspberry Pi 3: A Raspberry Pi 3 is an
electronic board that records the sound and
stores it on the memory card when the Wi-Fi
is off. If the Wi-Fi is on, it stores the data
in the cloud. The Raspberry Pi gives higher
performance than the Arduino.


Sound classification: The sound is then
processed by the classification part of the
CNN algorithm, which analyzes the type of
sound and reports it to the user via a
notification.


Notification: A notification is sent to the
user through the desktop or personal
computer when a particular sound is
detected. This is useful for hearing-impaired
people.
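
The classification and notification steps can
be sketched as follows. The class names are
those of the UrbanSound8K dataset; the set
of alert classes, the min_confidence
threshold and the function name
build_notification are assumptions made for
this illustration and are not specified in the
paper.

```python
# Class names of the UrbanSound8K dataset, in class-ID order.
CLASSES = [
    "air_conditioner", "car_horn", "children_playing", "dog_bark", "drilling",
    "engine_idling", "gun_shot", "jackhammer", "siren", "street_music",
]

# Hypothetical choice of which classes should raise an alert.
ALERT_CLASSES = {"car_horn", "gun_shot", "siren"}


def build_notification(probabilities, min_confidence=0.6):
    """Return a message for the most likely class, or None if no alert is due."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    label = CLASSES[best]
    if label not in ALERT_CLASSES or probabilities[best] < min_confidence:
        return None
    return f"{label.replace('_', ' ')} detected ({probabilities[best]:.0%} confidence)"


# Made-up probability distribution in which "siren" dominates.
example = [0.02, 0.03, 0.01, 0.04, 0.02, 0.03, 0.01, 0.02, 0.80, 0.02]
print(build_notification(example))
```

Returning None for low-confidence or
non-critical sounds keeps the user from being
interrupted by every background noise.
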

III.LITERATURE SURVEY


Different methods of sound classification
using Arduino and various machine learning
techniques have been reported.


Arushi Singh, L.Ezhilarasi et al [1] have
proposed a system which uses sound sensors
to detect sound and then transmit the sound
data. They focus on a sound pollution
monitoring system that senses sound using a
Raspberry Pi. A Raspberry Pi module
interacts with the sensors, processes the
sound data and sends an alert to the
application. It can detect and monitor sound
pollution levels using IoT and then sends a
mail or SMS to the system. In future work,
they plan to implement this concept using
machine learning.


Baker Fleurys et al [2] have proposed a
system that performs certain tasks by
recognizing sounds in a home automation
system. The system produces poor
recognition results when noisy sounds are
mixed with the target sound because the
sounds occur simultaneously. To solve this
problem, the paper presents a framework
consisting of sound verification and sound
separation techniques based on wireless
sensor networks. The applications of
wireless sensor networks include home
automation systems, security systems and
power management.


L. Korhonen, J. Pakka [3] have proposed a
health care system for elderly people.
Monitoring can be done using
telecommunication in hospitals to reduce
the cost of hospitalization and to improve
the comfort of the patients in the hospital.
In this paper, sound is detected and
classified in a noisy environment using
sound surveillance. There are two stages:
the first detects the sound and extracts it
from the signal flow, and the second, sound
classification, identifies the unknown
sounds, which are then validated. The future
aim is to find a fusion between the
classification and the analysis of sound in a
medical telemonitoring system.


Lin Goldman et al [4] have proposed a
system for sensing human behavior using a
microphone, which reveals key information
from which the person's behavior is
inferred. The microphone is present in
modern smartphones, laptops and PCs. The
approach used for sensing the sound is
supervised learning, which is a part of
machine learning. The problem with this
system is that it is time consuming and
restricted in the types of sound it covers. It
explored general sound classification
without using explicit supervised training
on the smartphone. The supervised
techniques will produce better results
compared to the fully supervised DS.


F.Pachet and D.Cazaly et al [5] have
proposed a system for classification of
audio signals. Music genre is the
characterization shared by the members of
a class, and it relates to the instrumentation,
the rhythmic structure and the harmonic
content of the music. Music genres are
explored in a hierarchy for the automatic
classification of audio signals. Three
important feature sets have been proposed:
timbral texture, rhythmic content and pitch
content. Audio classification separates
music from non-music sounds. The audio
signals are classified into music, speech and
laughter, and environmental sounds are also
detected. Melody extraction is a hard
problem that is not solved for general audio,
so imperfect melody extraction algorithms
are used. The pattern recognition and sound
recognition are contained in a C++ server.


N.Morgan, G.Dahl et al [6] have proposed
a system for speech recognition using a
convolution neural network and a hybrid
neural network. A hidden Markov model is
used to increase the accuracy of speech
recognition. Some types of speech
variability are accommodated by the
convolution neural network structure. It
recognizes speech only, and recognition is
not performed on the phone. The CNN can
be pre-trained to improve the performance
of the recognition. The speech is recognized
using speech datasets.


Y.Peng, C.Lin et al [7] have proposed a
system for sound event classification using
semi-supervised learning. They mainly
focus on sound classification in large
datasets. Manually labeling sound data is
expensive, while a large amount of
unlabeled sound data is available to the
public. The paper therefore makes use of
semi-supervised learning for sound analysis.
The future work will focus on large
unlabeled datasets to detect, analyze and
classify what the sound is.


B.Najali, N.Noury et al [8] have proposed
a system for flats to recognize sound and
speech. They placed eight microphones to
detect sound, and the system is trained on
environmental sounds. It automatically
analyzes and sorts the different types of
sound recorded in the flat. This paper
presents a complete sound and speech
recognition system using unsupervised
learning under real-time conditions. Testing
showed that the sound recognition results
are good and accurate.


Y. LeCun, L. Bottou, Y. Bengio et al [9]
have proposed environmental sound
classification using a convolution neural
network. The approach is evaluated on 3
public datasets of environmental and urban
sounds, and the accuracy is reported.
Convolution neural networks have enabled
significant progress in numerous pattern
recognition tasks. The size of the dataset
influences the performance of a supervised
deep model: increasing the size of the
dataset improves the performance of the
trained models. The system detects
environmental sounds using a machine
learning algorithm based on a convolution
neural network.


J. G. Ryan and R. A. Goubran et al [10]
have proposed a system for detecting
different types of sound in a given area and
determining both the location and the type
of the sound. For non-speech audio
segments, additional features are computed
to perform audio classification, which
determines the nature of the sound (e.g.,
wind noise, footsteps, door opening or
closing, fan noise). It can work in
environments with a degraded signal to
noise ratio (SNR) by using a
speech/non-speech algorithm. This paper
proposed a security monitoring instrument
that can detect and classify the nature and
location of different sounds in a room.


IV.CONCLUSION AND FUTURE WORK

The performance of the system in detecting
and classifying sounds has been evaluated
and summarized. Sound event detection and
classification is proposed using the Internet
of Things and a machine learning algorithm,
delivered as a smart web application for
physically challenged people.

In future work, this web application can be
made into a mobile application, because
mobile apps are used more widely than web
apps and the system would work efficiently
as a mobile application. It can also be
implemented using various other deep
learning algorithms.